AITopics | Southern Region

Southern Region

MAD-Fact: A Multi-Agent Debate Framework for Long-Form Factuality Evaluation in LLMs

Ning, Yucheng, Lin, Xixun, Fang, Fang, Cao, Yanan

arXiv.org Artificial Intelligence, Oct-30-2025

The widespread adoption of Large Language Models (LLMs) raises critical concerns about the factual accuracy of their outputs, especially in high-risk domains such as biomedicine, law, and education. Existing evaluation methods for short texts often fail on long-form content due to complex reasoning chains, intertwined perspectives, and cumulative information. To address this, we propose a systematic approach integrating large-scale long-form datasets, multi-agent verification mechanisms, and weighted evaluation metrics. We construct LongHalluQA, a Chinese long-form factuality dataset, and develop MAD-Fact, a debate-based multi-agent verification system. We introduce a fact importance hierarchy to capture the varying significance of claims in long-form texts. Experiments on two benchmarks show that larger LLMs generally maintain higher factual consistency, while domestic models excel on Chinese content. Our work provides a structured framework for evaluating and enhancing factual reliability in long-form LLM outputs, guiding their safe deployment in sensitive domains.

large language model, machine learning, natural language, (14 more...)

2510.22967

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > Middle East > Malta > Southern Region > Southern Harbour District > Luqa (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch

Wen, Xueru, Lou, Jie, Li, Zichao, Lu, Yaojie, Yu, Xing, Ji, Yuqiu, Xu, Guohai, Lin, Hongyu, He, Ben, Han, Xianpei, Sun, Le, Zhang, Debing

arXiv.org Artificial Intelligence, Mar-1-2025

Reward models (RMs) are crucial for aligning large language models (LLMs) with human preferences. However, most RM research is centered on English and relies heavily on synthetic resources, which leads to limited and less reliable datasets and benchmarks for Chinese. To address this gap, we introduce CheemsBench, a fully human-annotated RM evaluation benchmark within Chinese contexts, and CheemsPreference, a large-scale and diverse preference dataset annotated through human-machine collaboration to support Chinese RM training. We systematically evaluate open-source discriminative and generative RMs on CheemsBench and observe significant limitations in their ability to capture human preferences in Chinese scenarios. Additionally, based on CheemsPreference, we construct an RM that achieves state-of-the-art performance on CheemsBench, demonstrating the necessity of human supervision in RM training. Our findings reveal that scaled AI-generated data struggles to fully capture human preferences, emphasizing the importance of high-quality human supervision in RM development.

arxiv, preprint, zhang, (17 more...)

2502.17173

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > China > Beijing > Beijing (0.04)
Europe > Middle East > Malta > Southern Region > Southern Harbour District > Luqa (0.04)
(3 more...)

Genre:

Research Report > New Finding (0.48)
Instructional Material > Training Manual (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.97)

The Best of Both Worlds: a Framework for Combining Degradation Prediction with High Performance Super-Resolution Networks

Aquilina, Matthew, Ciantar, Keith George, Galea, Christian, Camilleri, Kenneth P., Farrugia, Reuben A., Abela, John

arXiv.org Artificial Intelligence, Nov-9-2022

To date, the best-performing blind super-resolution (SR) techniques follow one of two paradigms: (A) generate and train a standard SR network on synthetic low-resolution/high-resolution (LR-HR) pairs or (B) attempt to predict the degradations an LR image has suffered and use these to inform a customised SR network. Despite significant progress, subscribers to the former miss out on useful degradation information that could be used to improve the SR process. On the other hand, followers of the latter rely on weaker SR networks, which are significantly outperformed by the latest architectural advancements. In this work, we present a framework for combining any blind SR prediction mechanism with any deep SR network, using a metadata insertion block to insert prediction vectors into SR network feature maps. Through comprehensive testing, we prove that state-of-the-art contrastive and iterative prediction schemes can be successfully combined with high-performance SR networks such as RCAN and HAN within our framework. We show that our hybrid models consistently achieve stronger SR performance than both their non-blind and blind counterparts. Furthermore, we demonstrate our framework's robustness by predicting degradations and super-resolving images from a complex pipeline of blurring, noise and compression.

artificial intelligence, machine learning, natural language, (18 more...)

doi: 10.3390/s23010419

2211.05018

Country:

Europe > Middle East > Malta > Eastern Region > Northern Harbour District > Msida (0.04)
North America > United States (0.04)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
(3 more...)

Genre:

Overview (0.92)
Research Report > New Finding (0.67)

Industry: Information Technology (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(5 more...)

Malta: The Innovation Island AIBC Summit

#artificialintelligence, Oct-21-2019, 09:43:27 GMT

If you look at the past four years, we've been enjoying substantial economic growth. For the economy to be resilient to external shocks, we have to continue to diversify and explore new niches to sustain that growth, and for this reason we have delved into niche economic areas. We started two years ago with blockchain, and we've been attracting significant investment to our island, not only in crypto but also in other areas of technological development. Now we're seeing new development from companies that are investing in technology and coming here to work and operate from Malta. We're also seeing a spill-over effect, such as companies from the iGaming industry that are producing new products supported by blockchain technology.

artificial intelligence, innovation island aibc summit, malta, (15 more...)

Country:

Europe > Middle East > Malta > Southern Region > Southern Harbour District > Luqa (0.05)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
Africa > Middle East > Libya (0.05)

Industry:

Information Technology (1.00)
Government (0.97)
Law (0.71)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.30)
